Broadcast Audio and Video Bimodal Corpus Exploitation and Application

Authors

  • Yu Zou
  • Min Hou
  • Yudong Chen
  • Fengguo Hu
  • Li Fu

Abstract

The main purpose of this paper is to describe the exploitation and application of an audio and video bimodal corpus of the Chinese language in broadcasting. First, it deals with determining the size and structure of the speech samples according to the features of radio and television programs. Second, it discusses the annotation method for broadcast speech, the achievements made so far, and suggested future improvements. Finally, it presents an attempt to describe the distribution of annotated items in the corpus.

Similar resources

A Detailed Description of the AVOZES Data Corpus

The AVOZES data corpus has recently been made publicly available for other interested researchers. It is the first publicly available audio-video speech data corpus for Australian English. It contains recordings from 20 speakers and the sequences provide both a systematic coverage of the phonemes and visemes of Australian English as well as some application-driven utterances. AVOZES is also the...

Video Search Engine Using Dual-media Segmentation

Most work in the area of segmentation of broadcast material has concentrated solely on the video element of the media. This paper looks at an algorithm which, in addition, utilises the audio track as a means of identifying meaningful scene breaks. This work is set in the context of a web-based video search engine that is demonstrated using broadcast news, and the effectiveness of the algorithm ...

Joint modality fusion and temporal context exploitation for semantic video analysis

In this paper, a multi-modal context-aware approach to semantic video analysis is presented. Overall, the examined video sequence is initially segmented into shots and for every resulting shot appropriate color, motion and audio features are extracted. Then, Hidden Markov Models (HMMs) are employed for performing an initial association of each shot with the semantic classes that are of interest...

Multimedia interaction for the new millennium

Spoken language processing has created value in multiple application areas such as document transcription, data base entry, and command and control. Recently scientists have been focusing on a new class of application that promises on-demand access to multimedia information such as radio and broadcast news. In separate research, augmenting traditional graphical interfaces with additional modali...

CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition

In this paper, an audio-visual speech corpus, CENSREC-1-AV, for noisy speech recognition is introduced. CENSREC-1-AV consists of an audio-visual database and a baseline system of bimodal speech recognition which uses audio and visual information. In the database, there are 3,234 and 1,963 utterances made by 42 and 51 speakers as training and test sets, respectively. Each utterance consists of ...

Publication date: 2006